
    A Data-Oriented Model of Literary Language

    We consider the task of predicting how literary a text is, with a gold standard derived from human ratings. Aside from a standard bigram baseline, we apply rich syntactic tree fragments, mined from the training set, and a series of hand-picked features. Our model is the first to distinguish degrees of highly and less literary novels using a variety of lexical and syntactic features, and it explains 76.0% of the variation in literary ratings.

    Comment: To be published in EACL 2017, 11 pages.
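    As an illustration of the kind of pipeline this abstract describes, the sketch below joins a bigram baseline with counts of pre-mined syntactic tree fragments in one regression model. This is not the authors' implementation: scikit-learn is an assumption, and `fragment_counts` is a hypothetical stand-in for the fragment-mining step, which the abstract only names.

```python
# Illustrative sketch only: a scikit-learn pipeline combining a bigram
# baseline with syntactic tree-fragment features. `fragment_counts` is a
# hypothetical placeholder for fragments mined from parsed training data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVR

def fragment_counts(texts):
    """Hypothetical: frequency of each pre-mined tree fragment per text."""
    return [{"(NP (DT the) (JJ *) (NN *))": 0.0} for _ in texts]  # placeholder

features = make_union(
    TfidfVectorizer(ngram_range=(2, 2)),  # word-bigram baseline
    make_pipeline(FunctionTransformer(fragment_counts, validate=False),
                  DictVectorizer()),
)
model = make_pipeline(features, LinearSVR())
# model.fit(train_texts, train_ratings)
# model.score(test_texts, test_ratings)  # R^2, i.e. variation explained
```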

    Annotation and Prediction of Movie Sentiment Arcs

    Some narratologists have argued that all stories derive from a limited set of archetypes. Specifically, Vonnegut (2005) claims in his Shapes of Stories lecture that if we graph the emotions in a story over time, the shape will be an instance of one of six basic story shapes. The work of Jockers (2015) and Reagan et al. (2016) purports to confirm this hypothesis empirically using automatic sentiment analysis (rather than manual annotations of story arcs) and algorithms to cluster story arcs into fundamental shapes. Later work has applied similar techniques to movies (Del Vecchio et al., 2019). This line of work has attracted criticism. Swafford (2015) argues that sentiment analysis needs to be validated on and adapted to narrative text. Enderle (2016) argues that the various methods for reducing story shapes to the putative six fundamental types actually produce algorithmic artifacts, and that random sentiment arcs can also be clustered into six "fundamental" shapes.

    In this paper I will not attempt to find fundamental (or even universal) story shapes; instead, I take the observed story shape of each narrative as is, without trying to cluster the shapes into archetypes. My aim is to empirically validate how well basic sentiment analysis tools can reproduce a sentiment arc obtained through manual annotation based on the narrative text. Rather than considering novels as narratives, I consider movies, since the annotation of movies, when done in real time, is less time consuming. In a previous abstract, I considered the task of predicting the annotated sentiment of individual sentences from movie scripts (van Cranenburgh, 2020), and concluded that sentiment analysis tools achieve comparable performance on narrative text as on reviews and social media text (pace Swafford 2015). In this abstract I consider the task of predicting the overall sentiment as annotated while watching the movie. This task is more challenging, since the connection between the annotated sentiment and the narrative text is potentially more distant.
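    To make the validation task concrete, the sketch below derives a sentiment arc from per-sentence scores and measures its agreement with a manually annotated arc. VADER (via NLTK) stands in for the unnamed "basic sentiment analysis tools", and the smoothing window and correlation measure are arbitrary illustrative choices, not the paper's protocol.

```python
# Sketch of comparing an automatic sentiment arc to a manual annotation.
# VADER is an assumption; the abstract does not name a specific tool.
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download("vader_lexicon")

def sentiment_arc(sentences, window=25):
    """Per-sentence compound sentiment, smoothed with a moving average."""
    sia = SentimentIntensityAnalyzer()
    scores = np.array([sia.polarity_scores(s)["compound"] for s in sentences])
    return np.convolve(scores, np.ones(window) / window, mode="valid")

def arc_agreement(auto_arc, manual_arc):
    """Pearson correlation after resampling both arcs to a common length."""
    grid = np.linspace(0.0, 1.0, max(len(auto_arc), len(manual_arc)))
    a = np.interp(grid, np.linspace(0.0, 1.0, len(auto_arc)), auto_arc)
    m = np.interp(grid, np.linspace(0.0, 1.0, len(manual_arc)), manual_arc)
    return np.corrcoef(a, m)[0, 1]
```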

    Machine Learning Literature using Textual Features

    Literature is hard to define. The value-judgment definition holds that literature is a highly valued kind of writing [2, p. 9], but how arbitrary or predictable are such judgments? Moreover, some believe that critics and publishers wield more influence than the text itself [1]. We investigate these questions with a computational model of literature trained on texts. As part of The Riddle of Literary Quality (http://literaryquality.huygens.knaw.nl), an online survey (14k respondents) was conducted among the general public to collect judgments on 401 recent, bestselling Dutch novels. Given a list of author-title pairs, respondents rated the novels they had read on a 7-point scale from definitely not to highly literary. We consider the regression task of predicting the mean rating of each novel using features extracted from its text.

    We train a linear support vector regression model on frequencies of bigrams and on syntactic features. The syntactic features consist of tree fragments mined from the trees obtained by automatically parsing the novels. Our predictive model explains 57.5% of the variance in literary ratings, with a root mean squared error of 0.65 on a scale of 0–7 (evaluation based on 5-fold cross-validation with the 401 novels). This is in line with pilot experiments on a subset of the novels with only bigrams [3]. Although the bigrams form a simple, strong baseline, the syntactic features are more interpretable. We conclude that perceptions of literariness can be explained to a large extent from the text itself: there is an intrinsic literariness to literary texts.
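    The reported numbers suggest an evaluation along the following lines: cross-validated predictions from a linear SVR on bigram frequencies, scored with R² (variance explained) and RMSE. The sketch below assumes scikit-learn, simplifies the feature set (no tree fragments), and uses placeholder hyperparameters; it shows the protocol, not the authors' exact setup.

```python
# Sketch of the evaluation protocol: linear SVR on bigram frequencies,
# scored with 5-fold cross-validation. Feature set simplified; the full
# model also uses mined tree fragments.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

def evaluate(texts, mean_ratings):
    """Cross-validated R^2 and RMSE for predicting mean literary ratings."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(2, 2)),  # word-bigram frequencies
        LinearSVR(C=1.0, max_iter=10_000),
    )
    preds = cross_val_predict(model, texts, mean_ratings, cv=5)
    return r2_score(mean_ratings, preds), np.sqrt(mean_squared_error(mean_ratings, preds))

# r2, rmse = evaluate(novel_texts, ratings)  # abstract reports 0.575 and 0.65
```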